SRI Submissions to Chinese-English PatentMT NTCIR10

Authors

  • Bing Zhao
  • Jing Zheng
  • Nicolas Scheffer
  • Wen Wang
Abstract

The SRI team joined the subtask of Chinese-English Patent machine translation evaluation, and submitted the translation results using a combined output from two types of grammars supported in SRInterp, with two different word segmentations. We investigated the effect of adding sparse features, together with several optimization strategies. Also, for the PatentMT domain, we carried out preliminary experiments on adapting language models. Our results showed positive improvements using these approaches.


Similar resources

SRI's Submissions to Chinese-English PatentMT NTCIR10 Evaluation

The SRI team joined the subtask of Chinese-English Patent machine translation evaluation, and submitted the translation results using a combined output from two types of grammars supported in SRInterp [13], with two different word segmentations. We investigated the effect of adding sparse features, together with several optimization strategies. Also, for the PatentMT domain, we carried out preli...


The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10

We describe the statistical machine translation (SMT) systems developed at Heidelberg University for the Chinese-to-English and Japanese-to-English PatentMT subtasks at the NTCIR10 workshop. The core system used in both subtasks is a combination of hierarchical phrase-based translation and discriminative training using either large feature sets and ℓ1/ℓ2 regularization (for Japanese-to-English) ...


NTT-NII Statistical Machine Translation for NTCIR-10 PatentMT

This paper describes details of the NTT-NII system in the NTCIR10 PatentMT task. The system is an extension of the NTT-UT system in NTCIR-9 by: a new English dependency parser (for EJ task), a syntactic rule-based pre-ordering (for JE task), a syntax-based post-ordering (for JE task). Our system ranked 1st in EJ subtask both in automatic and subjective evaluation, and was the best SMT system in JE s...


Using Parallel Corpora to Automatically Generate Training Data for Chinese Segmenters in NTCIR PatentMT Tasks

Chinese texts do not contain spaces as word separators like English and many alphabetic languages. To use Moses to train translation models, we must segment Chinese texts into sequences of Chinese words. Increasingly more software tools for Chinese segmentation are populated on the Internet in recent years. However, some of these tools were trained with general texts, so might not handle domain...


HPB SMT of FRDC Assisted by Paraphrasing for the NTCIR-9 PatentMT

This paper describes the FRDC machine translation system for the NTCIR-9 PatentMT. The FRDC system JIANZHEN is a hierarchical phrase-based (HPB) translation system. We participated in all the three subtasks, i.e., Chinese to English, Japanese to English and English to Japanese. In this paper, we introduce a novel paraphrasing mechanism to handle a certain kind of Chinese sentences whos...



Publication date: 2013